Abstract: Secure data compression in the presence of side information at both a legitimate receiver and an eavesdropper is explored. A noise-free, limited-rate link between the source and the receiver, whose output can be perfectly observed by the eavesdropper, is assumed. As opposed to the wiretap channel model, in which secure communication is established by exploiting the noise in the channel, here the side information available at the receiver is exploited. Both coded and uncoded side information are considered. In the coded side information scenario, inner and outer bounds on the compression-equivocation rate region are given. In the uncoded side information scenario, the availability of the legitimate receiver's and the eavesdropper's side information at the encoder is considered, and the compression-equivocation rate region is characterized for these cases. It is shown that side information at the encoder can increase the equivocation rate at the eavesdropper. Hence, side information at the encoder is useful in terms of security; this is in contrast with pure lossless data compression, where side information at the encoder does not help.

I. Introduction 

Consider a sensor network in which multiple sensors observe an underlying phenomenon that needs to be reconstructed at an access point. While some sensors might have secure (possibly wired) connections to the access point, others might be transmitting over the wireless medium, which can be accessed by an adversary trying to obtain information about the underlying phenomenon. Furthermore, this adversary might have its own observation of the main source. Our goal is to explore the security issues in this sensor network scenario. Our model is a simplified version of the general problem, in which we assume a single sensor (Alice) having direct access to the underlying source that needs to be transmitted to the access point (Bob) reliably and securely. Furthermore, we assume an idealized noise-free channel whose output can also be observed by the eavesdropper (Eve).

If no side information is available to Bob, then we cannot achieve any level of security. However, if we assume the existence of a nearby sensor (Charlie) having access to correlated side information about Alice's source and a secure limited-rate link to Bob, this sensor might enable secure transmission of Alice's source using its own secure link (see Fig. 1). Our goal is to characterize the capacities of error-free communication links from Alice and Charlie to Bob such that Alice's information can be reliably transmitted to Bob, while keeping Eve's information about the source limited.

Fig. 1. Side information of Bob is provided by Charlie, who has access to his own correlated side information.

Secure communication over noisy channels in the presence of a wiretapper has recently attracted considerable interest. Information-theoretic security in this context is defined through the equivocation rate at the wiretapper, which can be roughly defined as the wiretapper's uncertainty about the message after observing the channel output. In his pioneering work [1], Wyner introduced the wiretap channel and showed that it is possible to transmit at a positive rate with perfect secrecy, assuming the wiretapper's channel is physically degraded with respect to the receiver's. Later, Wyner's analysis was extended to more general broadcast channels in [2], which characterizes the capacity-equivocation rate region. Various extensions of the wiretap channel model to multiuser scenarios and fading channels have recently been investigated [3], [4], [5].

In the wiretap channel model, the potential for secure communication arises from the fact that the intended receiver has a better quality communication channel than the wiretapper [2]. In our model, since the communication channels are noise-free, the techniques of [2] do not apply; however, it is still possible to achieve security when Bob has higher quality side information than Eve, as in [6], [7]. In [6], Merhav proved a source-channel separation theorem for the wiretap channel assuming that both the channel and the side information of the wiretapper are physically degraded. Recently, Prabhakaran and Ramchandran [7] considered the case of arbitrarily correlated side information, focusing only on the leakage rate to the eavesdropper. They found the minimum leakage rate and, through an example, argued that the availability of Bob's side information to Alice might increase Eve's uncertainty about Alice's source. Secure compression of two correlated sources is considered in [10], where the eavesdropper has access to only one of the compressed bit streams. Our work is also closely related to the secret key capacity model of [8], [9], where correlated sources are used for secure key generation. However, our goal here is not to generate a secret key between Alice and Bob. Instead, we wish to communicate Alice's source to Bob securely.

In this paper, we first consider the case in which the side information of Bob is provided by Charlie over a noise-free secure channel. After giving inner and outer bounds on the set of achievable compression-equivocation rates for this setup, we focus on the case in which the Charlie-Bob link has enough capacity for Bob to obtain Charlie's side information losslessly. For this scenario, which also corresponds to uncoded side information, we consider cases in which Bob's side information, Eve's side information, or both may be available to Alice. We show that, in the secure compression model, as opposed to usual lossless compression, where side information at the encoder does not improve performance, the availability of side information to Alice has the potential to improve secrecy performance. We generalize the characterization of the achievable compression and equivocation rates to all of these side information cases and provide illustrative examples.

II. System Model 

We assume that Alice has access to an $N$-length source sequence $A^N$, which she wants to transmit to Bob reliably over a noise-free, finite-capacity channel. Alice's transmission is also perfectly received by an eavesdropper called Eve. We assume that Eve has her own correlated side information $E^N$. On the other hand, a helper called Charlie has access to correlated side information $C^N$ and a limited-rate secure channel to Bob (see Fig. 1). We model $A^N$, $C^N$, and $E^N$ as being generated independent and identically distributed (i.i.d.) according to the joint probability distribution $p_{A,C,E}(a,c,e)$ over the finite alphabet $\mathcal{A} \times \mathcal{C} \times \mathcal{E}$. While Alice wants to transmit her source reliably to Bob, she also wants to maximize the equivocation at Eve, which represents Eve's uncertainty about $A^N$ after receiving Alice's transmission and combining it with her own side information $E^N$.

An $(R_A, R_C, N)$ code for secure source compression in this setup is composed of an encoding function at Alice¹, $f_A: \mathcal{A}^N \to \{1, 2, \ldots, 2^{NR_A}\}$, an encoding function at Charlie, $f_C: \mathcal{C}^N \to \{1, 2, \ldots, 2^{NR_C}\}$, and a decoding function at Bob, $g_N: \{1, 2, \ldots, 2^{NR_A}\} \times \{1, 2, \ldots, 2^{NR_C}\} \to \mathcal{A}^N$.

The equivocation rate of this code is defined as

$$\frac{1}{N} H\big(A^N \mid f_A(A^N), E^N\big), \qquad (1)$$

and the error probability of the code has the usual definition:

$$P_e^N = \Pr\big\{ g_N\big(f_A(A^N), f_C(C^N)\big) \neq A^N \big\}. \qquad (2)$$

¹ To keep the presentation simple, we assume deterministic coding here; however, as in [8], randomized coding can be considered by assuming that Alice, Bob, and Charlie initially generate independent random variables and keep the rest of the coding scheme deterministic. The proofs follow similarly.
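To make definition (1) concrete, the sketch below computes the equivocation rate of a deliberately tiny code by exhaustive enumeration. The setup is a hypothetical illustration, not a scheme from this paper: $N = 2$, $A_i \sim \text{Bernoulli}(1/2)$, Eve's side information erases each $A_i$ independently with probability $p_E$, and the (hypothetical) encoder `parity` sends only $a_1 \oplus a_2$. The function names are ours, and the sketch ignores Bob's reliability constraint (2).

```python
import itertools
import math
from collections import defaultdict


def equivocation_rate(n, p_e, encoder):
    """Return (1/N) H(A^N | f_A(A^N), E^N) in bits, by exhaustive enumeration.

    Assumed toy model: A^N is i.i.d. Bernoulli(1/2) and E^N erases each A_i
    independently with probability p_e ("?" marks an erasure).
    `encoder` maps a tuple of bits to Alice's transmitted index.
    """
    joint = defaultdict(float)  # p(a^N, y) with Eve's view y = (f_A(a^N), e^N)
    for a in itertools.product((0, 1), repeat=n):
        for mask in itertools.product((False, True), repeat=n):
            e = tuple("?" if erased else bit for bit, erased in zip(a, mask))
            p = 0.5 ** n
            for erased in mask:
                p *= p_e if erased else 1 - p_e
            joint[(a, (encoder(a), e))] += p
    eve_view = defaultdict(float)  # marginal p(y)
    for (_, y), p in joint.items():
        eve_view[y] += p
    # H(A^N | Y) = -sum_{a, y} p(a, y) log2 p(a | y)
    h = -sum(p * math.log2(p / eve_view[y]) for (_, y), p in joint.items() if p > 0)
    return h / n


def parity(a):
    """Hypothetical encoder: send only the XOR of all source bits."""
    return sum(a) % 2


print(equivocation_rate(2, 0.5, parity))  # prints 0.125, i.e. p_e**2 / 2
```

Eve remains uncertain only when both of her symbols are erased, which gives $H = p_E^2$ for the block. Replacing `parity` by `lambda a: 0` (send nothing) gives $H(A^N \mid E^N)/N = p_E$, and sending `a` itself gives 0.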



Definition 2.1: We say that $(R_A, R_C, \Delta)$ is achievable if, for any $\epsilon > 0$, there exists an $(R_A, R_C, N)$ code such that

$$H\big(A^N \mid f_A(A^N), E^N\big) \ge N\Delta \quad \text{and} \quad P_e^N < \epsilon.$$

III. Coded and Uncoded Side Information at Bob 

In this section, we give inner and outer bounds on the set of all achievable $(R_A, R_C, \Delta)$ triplets. In general, these bounds do not match.

Theorem 3.1: For the setup above, $(R_A, R_C, \Delta)$ is achievable if

$$R_A \ge H(A \mid V), \qquad (3)$$
$$R_C \ge I(C;V), \qquad (4)$$
$$\Delta \le \max\{I(A;V \mid U) - I(A;E \mid U)\}, \quad \text{and} \qquad (5)$$
$$R_A + \Delta \ge H(A \mid E), \qquad (6)$$

where we maximize over auxiliary random variables $V$ and $U$ that come from the joint distribution $p(a,c,e,u,v) = p(a,c,e)\,p(u \mid a)\,p(v \mid c)$ with $|\mathcal{U}| \le |\mathcal{A}| + 1$ and $|\mathcal{V}| \le |\mathcal{C}| + 2$.

Conversely, if $(R_A, R_C, \Delta)$ is achievable, then (3)-(6) hold for some auxiliary random variables $V$ and $U$ for which $V - C - (A,E)$ and $U - A - (C,E)$ form Markov chains.

Proof: The proof is given in Appendix I. ■
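For a fixed choice of $(U, V)$, the single-letter quantities in (3)-(6) can be evaluated numerically. The sketch below is our own illustration, not part of the original analysis: it assumes finite alphabets, stores the joint pmf $p(a,c,e)\,p(u \mid a)\,p(v \mid c)$ as a Python dict, and uses hypothetical helper names (`H`, `cond_H`, `cond_I`, `erasure_source`, `theorem_3_1_terms`). As a test case it takes Charlie's observation $C$ to be an erased copy of $A$ (probability $p_C$), Eve's observation an independently erased copy (probability $p_E$), $U$ constant, and $V = C$, i.e., Charlie forwards $C$ losslessly. Evaluating a single choice of $(U, V)$ only yields one point of the inner bound, not the maximum in (5).

```python
import math
from collections import defaultdict

IDX = {"A": 0, "C": 1, "E": 2, "U": 3, "V": 4}  # positions in an outcome (a, c, e, u, v)


def H(pmf, names):
    """Joint entropy in bits of the variables named in `names`, e.g. "AV"."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[IDX[n]] for n in names)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def cond_H(pmf, x, given):
    """H(x | given) = H(x, given) - H(given)."""
    return H(pmf, x + given) - H(pmf, given)


def cond_I(pmf, x, y, given=""):
    """I(x; y | given) = H(x | given) - H(x | y, given)."""
    return cond_H(pmf, x, given) - cond_H(pmf, x, y + given)


def erasure_source(p_c, p_e):
    """Hypothetical example: A ~ Bern(1/2); C and E erase A independently.
    Auxiliaries: U constant (0) and V = C."""
    pmf = defaultdict(float)
    for a in (0, 1):
        for c_erased in (False, True):
            for e_erased in (False, True):
                c = "?" if c_erased else a
                e = "?" if e_erased else a
                p = 0.5 * (p_c if c_erased else 1 - p_c) * (p_e if e_erased else 1 - p_e)
                pmf[(a, c, e, 0, c)] += p
    return pmf


def theorem_3_1_terms(pmf):
    """Right-hand sides of (3)-(6) for this single choice of (U, V)."""
    return {
        "H(A|V)": cond_H(pmf, "A", "V"),  # (3): R_A >= H(A|V)
        "I(C;V)": cond_I(pmf, "C", "V"),  # (4): R_C >= I(C;V)
        "Delta_max": cond_I(pmf, "A", "V", "U") - cond_I(pmf, "A", "E", "U"),  # (5)
        "H(A|E)": cond_H(pmf, "A", "E"),  # (6): R_A + Delta >= H(A|E)
    }


terms = theorem_3_1_terms(erasure_source(p_c=0.25, p_e=0.5))
```

With $p_C = 0.25$ and $p_E = 0.5$ this gives $H(A \mid V) = 0.25$, $I(C;V) = H(C) \approx 1.561$, an equivocation bound of $p_E - p_C = 0.25$, and $H(A \mid E) = 0.5$, which agrees with the uncoded erasure example discussed later in Section III when $p_B = p_C$.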

We can consider this problem to be a generalization of source coding with coded side information [11], with a security constraint in addition to lossless compression. In the achievability of the inner bound given in Appendix I, Alice's encoder, instead of directly binning its observation with respect to the coded side information at Bob, uses an auxiliary codebook generated by $U$ to send her observation and to create higher equivocation at Eve. This auxiliary codebook generation resembles lossy source coding with coded side information [12], for which the single-letter characterization of the rate region remains an open problem. Similar to the inner and outer bounds for that problem [13], our inner and outer bounds differ in the joint distribution of the auxiliary random variables.

A special case of the above theorem is obtained when we assume that $R_C \ge H(C)$, that is, the side information $C^N$ of Charlie can be recovered by Bob with an arbitrarily small probability of error. In this scenario, to keep the presentation simple, we can assume that a side information sequence $B^N$ is available directly to Bob, where $B^N = C^N$ with high probability (see Fig. 2 with both switches open). For this uncoded side information case, the decoding function at Bob is replaced by $g_N: \{1, 2, \ldots, 2^{NR_A}\} \times \mathcal{B}^N \to \mathcal{A}^N$. Achievability is now defined similarly for an $(R_A, \Delta)$ pair.

We have the following corollary, which follows from Theorem 3.1. The proof of this special case (assuming no rate limitations between Alice and Bob) is also given in [7].

Corollary 3.2: For uncoded side information $B^N$ at Bob, $(R_A, \Delta)$ is an achievable rate-equivocation pair if and only if

$$R_A \ge H(A \mid B), \quad \text{and} \qquad (7)$$
$$\Delta \le \max\{I(A;B \mid U) - I(A;E \mid U)\}, \qquad (8)$$

where we maximize over auxiliary random variables $U$ such that $U - A - (B,E)$ form a Markov chain and $|\mathcal{U}| \le |\mathcal{A}| + 1$.

Fig. 2. Uncoded side information at Bob. The states of switches $S_B$ and $S_E$ model different scenarios in terms of the side information at the encoder.

While Corollary 3.2 requires an auxiliary codebook generated by $U$ in the general case to conceal the source from the eavesdropper, it is sometimes possible for ordinary Slepian-Wolf binning to achieve the highest possible security in terms of equivocation, i.e., (8) is maximized by a constant $U$. Some definitions are in order.

Definition 3.1: We say that the side information $B$ is less noisy than the side information $E$ if

$$I(U;E) \le I(U;B) \qquad (9)$$

for every probability distribution of the form $p(a,b,e,u) = p(a,b,e)\,p(u \mid a)$.

Definition 3.2: Side information $E$ is said to be physically degraded with respect to $B$ if $A - B - E$ form a Markov chain. We say that $E$ is stochastically degraded with respect to $B$ if there exists a joint probability distribution $\tilde{p}_{ABE}$ such that $\tilde{p}_{AB} = p_{AB}$, $\tilde{p}_{AE} = p_{AE}$, and $A - B - E$ is a Markov chain under $\tilde{p}_{ABE}$.

The less noisy condition is strictly weaker than the stochastically degraded condition [14]. Furthermore, the compression-equivocation rate region depends on the joint distribution $p_{ABE}$ only via its marginals $p_{AB}$ and $p_{AE}$. Hence, physical degradation and stochastic degradation are equivalent in this scenario.

Corollary 3.3: For uncoded side information at Bob, if Bob has less noisy side information than Eve, then an $(R_A, \Delta)$ pair is achievable if and only if

$$R_A \ge H(A \mid B), \quad \text{and} \qquad (10)$$
$$\Delta \le I(A;B) - I(A;E). \qquad (11)$$

Proof: Achievability follows simply by letting $U$ be constant in Corollary 3.2. For the converse, consider any $U$ with the joint distribution $p(u,a,b,e) = p(a,b,e)\,p(u \mid a)$. We have

$$
\begin{aligned}
&[I(A;B) - I(A;E)] - [I(A;B \mid U) - I(A;E \mid U)] \\
&\quad = [I(A;B) - I(A;E)] - [I(A,U;B) - I(B;U) - I(A,U;E) + I(E;U)] \qquad (12) \\
&\quad = I(B;U \mid E) - I(E;U \mid B) \qquad (13) \\
&\quad = I(B;U) - I(E;U) \ge 0, \qquad (14)
\end{aligned}
$$

where we used $I(A,U;B) = I(A;B)$ and $I(A,U;E) = I(A;E)$, which hold because $U - A - (B,E)$ is a Markov chain, together with the chain rule for mutual information; the last inequality is due to the less noisy assumption. ■
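The identity (12)-(14) and its sign can be spot-checked numerically. The sketch below is our own check, with hypothetical helper names: it assumes a uniform binary source observed through independent erasures at Bob ($p_B = 0.2$) and Eve ($p_E = 0.5$), so that Bob's side information is less noisy, and draws a random test channel $p(u \mid a)$ on a ternary alphabet ($|\mathcal{U}| = |\mathcal{A}| + 1$).

```python
import math
import random
from collections import defaultdict

A, B, E, U = (0,), (1,), (2,), (3,)  # positions in an outcome tuple (a, b, e, u)


def H(pmf, coords):
    """Entropy in bits of the marginal of `pmf` on the given tuple positions."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in coords)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def I(pmf, x, y, z=()):
    """Conditional mutual information I(X;Y|Z) via H(X,Z)+H(Y,Z)-H(X,Y,Z)-H(Z)."""
    return H(pmf, x + z) + H(pmf, y + z) - H(pmf, x + y + z) - H(pmf, z)


def erasure_with_aux(p_b, p_e, p_u_given_a):
    """p(a,b,e) p(u|a): uniform bit A, independent erasures at Bob and Eve."""
    pmf = defaultdict(float)
    for a in (0, 1):
        for b_erased in (False, True):
            for e_erased in (False, True):
                p_abe = 0.5 * (p_b if b_erased else 1 - p_b) * (p_e if e_erased else 1 - p_e)
                b = "?" if b_erased else a
                e = "?" if e_erased else a
                for u, p_u in enumerate(p_u_given_a[a]):
                    pmf[(a, b, e, u)] += p_abe * p_u
    return pmf


def random_channel(rng, size=3):
    """A random conditional pmf row p(. | a) on {0, ..., size - 1}."""
    w = [rng.random() for _ in range(size)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
pmf = erasure_with_aux(0.2, 0.5, [random_channel(rng), random_channel(rng)])
lhs = (I(pmf, A, B) - I(pmf, A, E)) - (I(pmf, A, B, U) - I(pmf, A, E, U))
rhs = I(pmf, B, U) - I(pmf, E, U)  # right-hand side of (14)
```

For erasure observations, $I(U;B) = (1 - p_B)\,I(U;A)$ and likewise for $E$, so `rhs` equals $(p_E - p_B)\,I(A;U)$ here, which is non-negative whenever $p_B \le p_E$.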



Corollary 3.3 for the special case of physically degraded side information at Eve is also given in [6]. The following corollary, which we state without proof, gives a condition under which no positive equivocation can be achieved.

Corollary 3.4: If Bob's side information is a stochastically degraded version of Eve's side information, then no positive equivocation rate is achievable, i.e., $\Delta = 0$.

We use the following simple example (suggested in [7]) to illustrate some of our results. Let the source sequence $A^N = (A_1, \ldots, A_N)$ available to Alice be an i.i.d. binary sequence with $A_i \sim \text{Bernoulli}(1/2)$. Bob's observation $B^N = (B_1, \ldots, B_N)$ is generated by independently erasing each element of $A^N$ with probability $p_B$; that is, $B_i = A_i$ with probability $1 - p_B$, and $B_i = \mathrm{e}$ (an erasure) with probability $p_B$. Similarly, Eve's observation $E^N = (E_1, \ldots, E_N)$ is an independently erased version of $A^N$: $E_i = A_i$ with probability $1 - p_E$, and $E_i = \mathrm{e}$ with probability $p_E$.

For $p_E \ge p_B$, the side information of Eve is a stochastically degraded version of the side information of Bob, and hence Bob's side information is less noisy than Eve's. Using Corollary 3.3, we know that a constant $U$ is optimal. Then the optimal equivocation is $\Delta = I(A;B) - I(A;E) = (1 - p_B) - (1 - p_E) = p_E - p_B$.

When $p_B > p_E$, $B^N$ is a stochastically degraded version of $E^N$. From Corollary 3.4, we get $\Delta = 0$.
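The closed-form values above can be reproduced numerically. A minimal sketch (our own; the function names are hypothetical): it computes $I(A;B)$ and $I(A;E)$ directly from the joint pmf of a uniform bit and its erased copy, then applies Corollary 3.3 when $p_E \ge p_B$ and Corollary 3.4 otherwise.

```python
import math


def mi_erased_uniform_bit(p_erase):
    """I(A;Y) in bits for A ~ Bernoulli(1/2) and Y an erased copy of A ("?" = erasure)."""
    joint = {}
    for a in (0, 1):
        joint[(a, a)] = 0.5 * (1 - p_erase)
        joint[(a, "?")] = 0.5 * p_erase
    p_a = {0: 0.5, 1: 0.5}
    p_y = {0: 0.5 * (1 - p_erase), 1: 0.5 * (1 - p_erase), "?": p_erase}
    return sum(p * math.log2(p / (p_a[a] * p_y[y]))
               for (a, y), p in joint.items() if p > 0)


def optimal_equivocation_no_si_at_alice(p_b, p_e):
    """Optimal Delta in the erasure example with both switches open."""
    if p_e >= p_b:
        # Eve's observation is degraded w.r.t. Bob's: constant U is optimal (Corollary 3.3).
        return mi_erased_uniform_bit(p_b) - mi_erased_uniform_bit(p_e)
    # Bob's observation is degraded w.r.t. Eve's (Corollary 3.4).
    return 0.0
```

For example, $p_B = 0.2$ and $p_E = 0.5$ give $I(A;B) = 0.8$ and $\Delta = 0.3$, while $p_B = 0.6$ and $p_E = 0.3$ give $\Delta = 0$.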

IV. Side information available to Alice 

In this section, we consider various cases in which Alice also has access to the side information available to Bob and/or Eve. We know from Slepian-Wolf source coding that the availability of Bob's side information at Alice does not help in terms of compression rates. However, as shown in [7] via a simple example, in the secure compression setup the availability of $B^N$ at Alice potentially enables higher equivocation rates at the eavesdropper. In the following theorem, we characterize the compression-equivocation rate regions for various side information scenarios at Alice.

Theorem 4.1: Consider secure source compression for uncoded side information at Bob, as illustrated in Fig. 2. An $(R_A, \Delta)$ pair is achievable if and only if

$$R_A \ge H(A \mid B), \quad \text{and} \qquad (15)$$
$$\Delta \le \max\{I(A;B \mid U) - I(A;E \mid U)\}, \qquad (16)$$

where we maximize over auxiliary random variables $U$ such that the joint distribution $p(u,a,b,e)$ is as given in the following table, depending on which switches are closed:

| Closed switches | $p(u,a,b,e)$ |
|---|---|
| $S_B$ | $p(a,b,e)\,p(u \mid a,b)$ |
| $S_E$ | $p(a,b,e)\,p(u \mid a,e)$ |
| $S_B$ and $S_E$ | $p(a,b,e)\,p(u \mid a,b,e)$ |



In the case when only the switch $S_E$ is closed, the rate region can be given explicitly as

$$R_A \ge H(A \mid B) \quad \text{and} \quad \Delta \le I(A;B \mid E). \qquad (17)$$

Proof: The proof resembles that of Theorem 3.1 and is omitted due to space limitations. ■

Note that the availability of either or both of the side information sequences at the transmitter enlarges the set of admissible auxiliary random variables $U$ and potentially results in a higher equivocation rate at the eavesdropper. To illustrate this, consider the random erasure side information example of Section III. Suppose that Bob's observation $B^N$ is available to Alice as well. Alice can then transmit only the bits erased at Bob, hence leaking the least amount of information to Eve. As stated in [7], it is possible to show that the optimal auxiliary random variable satisfies $U = A$ when there is an erasure at Bob, and $U$ is constant otherwise. The optimal equivocation rate in this case² is $\Delta = p_E(1 - p_B)$. Note that this equivocation is strictly larger than the one without side information at Alice whenever $0 < p_B < 1$ and $0 < p_E < 1$. Furthermore, even if Bob's side information is a stochastically degraded version of Eve's, i.e., $p_B > p_E$, we are still able to achieve a non-zero equivocation rate if this side information can be provided to Alice as well.

When only Eve's observation $E^N$ is available to Alice, it follows from (17) that the optimal equivocation rate is $I(A;B \mid E)$. In the erasure example, the optimal equivocation rate is found to be $\Delta = p_E(1 - p_B)$, which is the same as in the case when only switch $S_B$ is closed. We observe that, for this specific example of erased observations at Bob and Eve, the benefit of providing either Bob's or Eve's side information to Alice is the same. For this example, it is also possible to show that, even when both observation sequences are available to Alice, the optimal equivocation rate is still $\Delta = p_E(1 - p_B)$.
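The value $I(A;B \mid E) = p_E(1 - p_B)$ in the erasure example can be checked from (17) by computing the conditional mutual information directly from the joint pmf $p(a,b,e)$. The sketch below is our own illustration with hypothetical helper names; it assumes, as in the example, independent erasures at Bob and Eve.

```python
import math
from collections import defaultdict


def erasure_pmf(p_b, p_e):
    """p(a, b, e): A ~ Bernoulli(1/2); B and E erase A independently ("?" = erasure)."""
    pmf = defaultdict(float)
    for a in (0, 1):
        for b_erased in (False, True):
            for e_erased in (False, True):
                b = "?" if b_erased else a
                e = "?" if e_erased else a
                pmf[(a, b, e)] += (0.5 * (p_b if b_erased else 1 - p_b)
                                   * (p_e if e_erased else 1 - p_e))
    return pmf


def H(pmf, coords):
    """Entropy in bits of the marginal on the given tuple positions."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in coords)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def equivocation_eve_si_at_alice(p_b, p_e):
    """Delta = I(A;B|E) from (17); positions A=0, B=1, E=2."""
    pmf = erasure_pmf(p_b, p_e)
    # I(A;B|E) = H(A,E) + H(B,E) - H(A,B,E) - H(E)
    return H(pmf, (0, 2)) + H(pmf, (1, 2)) - H(pmf, (0, 1, 2)) - H(pmf, (2,))
```

With $p_B = 0.6$ and $p_E = 0.3$, Bob's observation is degraded with respect to Eve's, so Corollary 3.4 gives $\Delta = 0$ when both switches are open, whereas this evaluates to $p_E(1 - p_B) = 0.12$.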

While there is no difference between physically and stochastically degraded observations when both switches are open, this is no longer true when we consider side information at Alice. In the following corollary, we show that for a physically degraded observation at Eve, the availability of $E^N$ to Alice does not help. This is in contrast to stochastically degraded side information $E$, whose availability at Alice can increase the equivocation rate, as seen in the example above.

Corollary 4.2: If Eve's observation is a physically degraded version of Bob's side information, i.e., $A - B - E$ form a Markov chain, then providing this observation to Alice does not improve the equivocation rate.
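For the case in which only $S_E$ is closed, the corollary can be checked directly from (17). The short calculation below is our own check, not the omitted proof, and it does not cover the case with both switches closed:

$$
\begin{aligned}
I(A;B \mid E) &= I(A;B,E) - I(A;E) \\
&= I(A;B) + I(A;E \mid B) - I(A;E) \\
&= I(A;B) - I(A;E),
\end{aligned}
$$

where the last step uses $I(A;E \mid B) = 0$ for the Markov chain $A - B - E$. Since physical degradedness implies that Bob's side information is less noisy than Eve's, this coincides with the bound (11) of Corollary 3.3 for both switches open, and the rate constraint $R_A \ge H(A \mid B)$ is unchanged.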

V. Conclusion 

We have considered secure lossless compression in the presence of an eavesdropper with correlated side information. We have shown that secure communication can be enabled by another agent who has its own correlated side information and a secure link to the legitimate receiver. We have studied scenarios under which secure compression codebooks are identical to Slepian-Wolf codebooks. We have also characterized the compression-equivocation rate regions considering the availability of side information at the encoder. We have shown that, while it is useless in the pure lossless compression setup, side information at the encoder may help to increase the equivocation rate in the secure compression model.

² There is a typo in the leakage rate of $1 - p_y p_z$ reported in [7]. It should have been $1 - p_z + p_y p_z$.

Appendix I 
Proof of Theorem 3.1

Inner bound: We fix $p(u \mid a)$ and $p(v \mid c)$ satisfying the conditions in the theorem. Then we generate $2^{N(I(A;U)+\epsilon_1)}$ independent codewords of length $N$, $U^N(w_1)$, $w_1 \in \{1, \ldots, 2^{N(I(A;U)+\epsilon_1)}\}$, with distribution $\prod_{i=1}^{N} p(u_i)$. We randomly bin all $U^N(w_1)$ sequences into $2^{N(I(A;U \mid V)+\epsilon_2)}$ bins, calling them the auxiliary bins. For each codeword $U^N(w_1)$, we denote the corresponding auxiliary bin index by $a(w_1)$. On the other hand, we randomly bin all $A^N$ sequences into $2^{N(H(A \mid V,U)+\epsilon_3)}$ bins, calling them the source bins, and denote the corresponding bin index by $s(A^N)$. We also generate $2^{N(I(C;V)+\epsilon_4)}$ independent codewords $V^N(w_2)$ of length $N$, $w_2 \in \{1, \ldots, 2^{N(I(C;V)+\epsilon_4)}\}$, with distribution $\prod_{i=1}^{N} p(v_i)$.

For each typical outcome of $A^N$, Alice finds a jointly typical $U^N(w_1)$. Then she reveals $a(w_1)$, the auxiliary bin index of $U^N(w_1)$, and $s(A^N)$, the source bin index of $A^N$, to both Bob and Eve; that is, the encoding function $f_A$ of Alice is composed of the pair $(a(w_1), s(A^N))$. Using standard techniques, it is possible to show that such a unique index pair exists with high probability.

The helper, Charlie, observes the outcome of its source $C^N$, finds a $V^N$ jointly typical with $C^N$, and sends the index $w_2$ of $V^N$ over the private channel to Bob. With high probability $C^N$ will be a typical outcome, and there will be a unique $V^N(w_2)$ that is jointly typical with $C^N$. Bob, having access to $V^N(w_2)$ and the auxiliary bin index $a(w_1)$, can find the jointly typical $U^N(w_1)$ correctly with high probability. Then, using $V^N(w_2)$, $U^N(w_1)$, and the source bin index $s(A^N)$, Bob can reliably decode the source sequence $A^N$. Letting $\epsilon_i \to 0$ for $i = 1, 2, 3, 4$, we can make the total communication rate of Alice arbitrarily close to $I(A;U \mid V) + H(A \mid U,V) = H(A \mid V)$, while having an error probability less than $\epsilon$ for sufficiently large $N$.

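The equivocation rate for this scheme can be found as

$$
\begin{aligned}
\frac{1}{N} H\big(A^N \mid a(w_1), s(A^N), E^N\big)
&= \frac{1}{N}\Big[H(A^N) - I\big(A^N; a(w_1), s(A^N), E^N\big)\Big] \\
&= \frac{1}{N}\Big[H(A^N) - I\big(A^N; a(w_1), E^N\big) - I\big(A^N; s(A^N) \mid E^N, a(w_1)\big)\Big] \\
&\ge \frac{1}{N}\Big[H(A^N) - I\big(A^N; U^N, E^N\big) - H\big(s(A^N)\big)\Big] \qquad (18) \\
&= H(A \mid U,E) - H(A \mid V,U) - \epsilon_3 \qquad (19) \\
&= I(A;V \mid U) - I(A;E \mid U) - \epsilon_3,
\end{aligned}
$$

where (18) follows from the data processing inequality, since $a(w_1)$ is a function of $U^N(w_1)$, together with $I(A^N; s(A^N) \mid E^N, a(w_1)) \le H(s(A^N))$; and (19) follows from the fact that $s(A^N)$ is a random variable over a set of size $2^{N(H(A \mid V,U)+\epsilon_3)}$.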
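The last equality above is a two-line identity that holds for any joint distribution; spelled out (our own expansion of the step):

$$
\begin{aligned}
H(A \mid U,E) - H(A \mid V,U) &= \big[H(A \mid U) - I(A;E \mid U)\big] - \big[H(A \mid U) - I(A;V \mid U)\big] \\
&= I(A;V \mid U) - I(A;E \mid U).
\end{aligned}
$$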
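Finally, we also have

$$
\begin{aligned}
\frac{1}{N} H\big(A^N \mid a(w_1), s(A^N), E^N\big)
&= \frac{1}{N}\Big[H(A^N \mid E^N) - I\big(A^N; a(w_1), s(A^N) \mid E^N\big)\Big] \\
&\ge H(A \mid E) - \frac{1}{N} H\big(a(w_1), s(A^N)\big) \qquad (20) \\
&\ge H(A \mid E) - R_A. \qquad (21)
\end{aligned}
$$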

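Outer bound: Let $J = f_A(A^N)$ and $K = f_C(C^N)$. From Fano's inequality, we have $H(A^N \mid J,K) \le N\delta(P_e^N)$, where $\delta(x)$ is a non-negative function with $\lim_{x \to 0} \delta(x) = 0$. In what follows, we write $\epsilon \triangleq \delta(P_e^N)$.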
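One standard way to obtain such a $\delta$ (our own illustration, not necessarily the choice intended here) is to apply Fano's inequality to the estimate $g_N(J,K)$ of $A^N$:

$$
H(A^N \mid J,K) \le h(P_e^N) + P_e^N \log\big(|\mathcal{A}|^N - 1\big) \le N\big[h(P_e^N) + P_e^N \log|\mathcal{A}|\big],
$$

where $h(\cdot)$ is the binary entropy function and the second inequality uses $h(P_e^N) \le N h(P_e^N)$ for $N \ge 1$. Hence $\delta(x) = h(x) + x \log|\mathcal{A}|$ is non-negative and tends to 0 as $x \to 0$.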

Define $U_i = (J, A^{i-1}, E^{i-1})$ and $V_i = (K, C^{i-1})$. Note that both $U_i - A_i - (C_i, E_i)$ and $V_i - C_i - (A_i, E_i)$ form Markov chains. Then, we have the following chain of inequalities:

$$N R_C \ge H(K) \ge I(C^N; K) = \sum_{i=1}^{N} I(C_i; K, C^{i-1}) = \sum_{i=1}^{N} I(C_i; V_i), \qquad (22)$$

where (22) follows from the chain rule of mutual information and the memoryless assumption on $C^N$. We also have



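$$
\begin{aligned}
N R_A &\ge H(J) \ge H(J \mid K) \\
&= H(A^N, J \mid K) - H(A^N \mid J, K) \\
&\ge H(A^N \mid K) - N\epsilon \qquad (23) \\
&= \sum_{i=1}^{N} H(A_i \mid K, A^{i-1}) - N\epsilon \\
&\ge \sum_{i=1}^{N} H(A_i \mid K, A^{i-1}, C^{i-1}) - N\epsilon \qquad (24) \\
&= \sum_{i=1}^{N} H(A_i \mid K, C^{i-1}) - N\epsilon \qquad (25) \\
&= \sum_{i=1}^{N} H(A_i \mid V_i) - N\epsilon,
\end{aligned}
$$

where (23) follows from Fano's inequality and the nonnegativity of entropy; (24) follows since conditioning reduces entropy; and (25) follows as $A_i - (K, C^{i-1}) - A^{i-1}$ form a Markov chain.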

Finally, we can also obtain

$$
\begin{aligned}
H(A^N \mid J, E^N) &= H(A^N \mid J) - I(A^N; E^N \mid J) \\
&= H(A^N \mid J, K) + I(A^N; K \mid J) - I(A^N; E^N \mid J) \\
&\le \sum_{i=1}^{N} \big[I(A_i; K \mid J, A^{i-1}) - H(E_i \mid J, E^{i-1})\big] + H(E^N \mid A^N, J) + N\epsilon \qquad (26) \\
&\le \sum_{i=1}^{N} \big[I(A_i; K \mid J, A^{i-1}, E^{i-1}) - H(E_i \mid J, A^{i-1}, E^{i-1})\big] + H(E^N \mid A^N) + N\epsilon \qquad (27) \\
&\le \sum_{i=1}^{N} \big[I(A_i; K, C^{i-1} \mid J, A^{i-1}, E^{i-1}) - H(E_i \mid J, A^{i-1}, E^{i-1}) + H(E_i \mid A_i)\big] + N\epsilon \qquad (28) \\
&= \sum_{i=1}^{N} \big[I(A_i; V_i \mid U_i) - H(E_i \mid U_i) + H(E_i \mid A_i)\big] + N\epsilon \qquad (29) \\
&= \sum_{i=1}^{N} \big[I(A_i; V_i \mid U_i) - I(A_i; E_i \mid U_i)\big] + N\epsilon, \qquad (30)
\end{aligned}
$$

where (26) follows from Fano's inequality and the chain rule of mutual information; (27) follows from the memoryless property of the source and the side information sequences, the fact that conditioning reduces entropy, and the fact that $J$ is a function of $A^N$; (28) follows from the chain rule and the non-negativity of mutual information, together with the memoryless property; (29) follows from the definitions of $V_i$ and $U_i$ given above; and (30) follows since $U_i - A_i - E_i$ form a Markov chain.

We define an independent random variable $Q$ uniformly distributed over the set $\{1, 2, \ldots, N\}$, and set $A = A_Q$, $C = C_Q$, $E = E_Q$, $V = (V_Q, Q)$, and $U = (U_Q, Q)$. Then, using the usual techniques, (3)-(5) follow, while $V - C - (A,E)$ and $U - A - (C,E)$ are Markov chains. Finally, we also have

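$$
\begin{aligned}
\frac{1}{N} H(A^N \mid E^N) &\le \frac{1}{N} H(A^N, J \mid E^N) \\
&= \frac{1}{N}\big[H(J \mid E^N) + H(A^N \mid E^N, J)\big] \\
&\le \frac{H(J)}{N} + \Delta \le R_A + \Delta.
\end{aligned}
$$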